I compared coverage profiles and estimated abundances for CAR and related transcripts using three sets of samples:
Data presented below was generated and/or compiled from several sources:
- sample_metrics_data: sample annotation (e.g., donor ID, timepoint) as well as sequencing & alignment metrics from RNA-seq processing
- sample_rapmap_data: read coverage measured across the length of the CAR transcript, based on mapping with the RapMap tool
- sample_salmon_data: abundance estimates (e.g., TPM, count) produced by the Salmon tool for CAR and several relevant transcripts when mapping to a modified human reference transcriptome (hg38)
- salmon_imgt_data: predicted/identified TCR junction sequences and alleles in single-cell libraries, as produced by assembly with Trinity followed by matching with IMGT High V-QUEST
- gff_file: custom-built GTF file describing where individual segments are located along the length of the CAR transcript

load("data/sample_metrics_data.RData")
load("data/sample_rapmap_data.RData")
load("data/sample_salmon_data.RData")
load("data/sample_tcr_data.RData")
gff_file <- "data/annotation/carPlus.gtf"
xcripts_gtf <- import.gff2(gff_file)
The plots below show read coverage from RapMap mapping across the length of the CAR sequence. Segments in the transcript, corresponding to the gene parts used to build the construct, are depicted by colored boxes in each plot. Transparency (i.e., alpha) is scaled based on the estimated abundance of the CAR transcript (TPM) as measured by Salmon.
To simplify the plots (and make it easier to distinguish between libraries), I’ll just plot the fitted line of coverage for each library.
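As a rough illustration of the "fitted line per library" idea (not the code used for the plots in this report), the sketch below fits one loess curve per library with base R and derives a transparency value from log2(TPM + 1). The toy data and the names `cov_df`, `lib_id`, `pos`, `cov`, `lib_tpm`, and `plot_fits` are all made up for the example, and loess is only assumed as the smoother.

```r
# Toy coverage table; every name here is hypothetical, not taken from the real data.
set.seed(1)
cov_df <- expand.grid(pos = seq(1, 3000, by = 25),
                      lib_id = c("lib1", "lib2", "lib3"),
                      stringsAsFactors = FALSE)
lib_tpm <- c(lib1 = 5, lib2 = 50, lib3 = 500)
cov_df$cov <- rpois(nrow(cov_df), lambda = sqrt(lib_tpm[cov_df$lib_id]))

# Fit one loess curve per library and keep the fitted values at each position.
fit_one <- function(d) {
  fit <- loess(cov ~ pos, data = d, span = 0.5)
  data.frame(lib_id = d$lib_id[1], pos = d$pos, cov_fit = predict(fit))
}
fit_df <- do.call(rbind, lapply(split(cov_df, cov_df$lib_id), fit_one))

# Rescale log2(TPM + 1) to an alpha value between 0.2 and 1.
log_tpm <- log2(lib_tpm + 1)
lib_alpha <- 0.2 + 0.8 * (log_tpm - min(log_tpm)) / (max(log_tpm) - min(log_tpm))

# Draw one semi-transparent fitted line per library.
plot_fits <- function(fit_df, lib_alpha) {
  plot(NULL, xlim = range(fit_df$pos), ylim = range(fit_df$cov_fit),
       xlab = "Position along CAR transcript", ylab = "Fitted coverage")
  for (id in names(lib_alpha)) {
    d <- fit_df[fit_df$lib_id == id, ]
    lines(d$pos, d$cov_fit, col = adjustcolor("steelblue", alpha.f = lib_alpha[[id]]))
  }
}
```

Calling `plot_fits(fit_df, lib_alpha)` draws the curves; libraries with higher estimated abundance come out more opaque.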
pass: median_cv_coverage < 1 AND mapped_reads_w_dups > 0.7
junction: functional TRA OR TRB junction sequence detected
TCR: paired TRA AND TRB for the same library
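The three flags above could be computed along these lines. This is a minimal sketch on toy data: only `median_cv_coverage` and `mapped_reads_w_dups` are names from the definitions above, while `metrics`, `has_tra`, and `has_trb` (assumed to mark a functional TRA or TRB junction detected in the library) are hypothetical.

```r
# Toy metrics table; `has_tra` / `has_trb` are hypothetical indicator columns.
metrics <- data.frame(
  lib_id              = c("lib1", "lib2", "lib3", "lib4"),
  median_cv_coverage  = c(0.6, 1.3, 0.8, 0.9),
  mapped_reads_w_dups = c(0.85, 0.90, 0.55, 0.75),
  has_tra             = c(TRUE, FALSE, TRUE, FALSE),
  has_trb             = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors    = FALSE
)

# pass: both QC thresholds must be met.
metrics$pass <- metrics$median_cv_coverage < 1 & metrics$mapped_reads_w_dups > 0.7

# junction: at least one functional chain detected.
metrics$junction <- metrics$has_tra | metrics$has_trb

# TCR: both chains detected in the same library.
metrics$tcr <- metrics$has_tra & metrics$has_trb
```

In this toy table, `lib2` fails on coverage variability and `lib3` fails on mapping rate, so only `lib1` and `lib4` pass.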
The following human transcripts (which overlap the CAR sequence) were quantified by Salmon.
| xcript_name | segment_version |
|---|---|
| CSF2 | GMCSFRss_r1 |
| CD28 | CD28tm_r1 |
| CD28 | CD28tm_r2 |
| CD28 | CD28tm_r3 |
| TNFRSF9 | IgG4hinge_r1 |
| CD247 | CD3Zeta_r1 |
| CD247 | CD3Zeta_r2 |
| EGFR | EGFRt_r1 |
| EGFR | EGFRt_r2 |
| EGFR | EGFRt_r3 |
| EGFR | EGFRt_r4 |
| EGFR | EGFRt_r5 |
Each plot shows CAR coverage across libraries, with lines colored according to the estimated abundance of the respective transcript.
I tried to come up with a relatively simple way to classify whether CAR was detected in a particular library.
expressed: log2(TPM + 1) \(\gt\) 0 for CAR transcript
| car_expr_tpm | nz_cov | n_libs |
|---|---|---|
| car_expr_tpm | FALSE | 18 |
| car_expr_tpm | TRUE | 55 |
| car_expr_tpm | NA | 1 |
| no_car_tpm | FALSE | 444 |
| no_car_tpm | TRUE | 37 |
| no_car_tpm | NA | 4 |
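
A cross-tabulation like the one above could be built as follows. This is only a sketch on toy values: the data frame `libs` and the column `car_tpm` are hypothetical, and the `nz_cov` flag is taken as given rather than derived. Note that log2(TPM + 1) > 0 is equivalent to TPM > 0.

```r
# Toy per-library table; `libs` and `car_tpm` are hypothetical names.
libs <- data.frame(
  lib_id  = paste0("lib", 1:6),
  car_tpm = c(0, 0, 3.2, 0, 41, NA),
  nz_cov  = c(FALSE, TRUE, TRUE, FALSE, TRUE, NA),
  stringsAsFactors = FALSE
)

# Label each library by the TPM-based rule; a missing TPM stays NA.
libs$car_expr_tpm <- ifelse(log2(libs$car_tpm + 1) > 0,
                            "car_expr_tpm", "no_car_tpm")

# Count libraries per (label, nz_cov) combination, keeping NA as its own level.
tab <- table(car_expr_tpm = libs$car_expr_tpm,
             nz_cov = libs$nz_cov,
             useNA = "ifany")
```

Printing `tab` gives the counts per combination, with missing values shown in their own row and column.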
expressed: log2(TPM + 1) \(\geq\) 2.5 in CAR or EGFRt transcripts OR log2(TPM + 1) \(\gt\) 2 in all CAR or EGFRt transcripts
| car_expr_quant | nz_cov | n_libs |
|---|---|---|
| car_expr_quant | FALSE | 107 |
| car_expr_quant | TRUE | 58 |
| car_expr_quant | NA | 2 |
| no_car_quant | FALSE | 355 |
| no_car_quant | TRUE | 34 |
| no_car_quant | NA | 3 |
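
The two-part rule above can be read in more than one way. The sketch below assumes one reading: a library is called expressed if any CAR or EGFRt transcript reaches log2(TPM + 1) >= 2.5, or if all of them exceed 2. The matrix `tpm` holds invented values, and only a subset of the EGFRt versions is included to keep the example short.

```r
# Toy TPM matrix (rows = libraries, columns = transcripts); values are invented.
tpm <- rbind(
  lib1 = c(CAR = 0,   EGFRt_r1 = 0,   EGFRt_r2 = 0),
  lib2 = c(CAR = 6,   EGFRt_r1 = 0,   EGFRt_r2 = 0),
  lib3 = c(CAR = 3.5, EGFRt_r1 = 3.2, EGFRt_r2 = 3.1),
  lib4 = c(CAR = 2,   EGFRt_r1 = 1,   EGFRt_r2 = 0)
)
log_tpm <- log2(tpm + 1)

# Assumed reading: "any transcript >= 2.5" OR "all transcripts > 2".
any_high <- apply(log_tpm >= 2.5, 1, any)
all_mod  <- apply(log_tpm > 2, 1, all)
car_expr_quant <- any_high | all_mod
```

Here `lib2` qualifies through a single strongly expressed transcript, while `lib3` qualifies only because every transcript clears the lower threshold.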
expressed: \(\geq\) 10 positions with \(\gt\) 0 reads in ANY of CD19scFv, T2A, or EGFRt
| car_expr_cov | nz_cov | n_libs |
|---|---|---|
| car_expr_cov | FALSE | 2 |
| car_expr_cov | TRUE | 53 |
| no_car_cov | FALSE | 460 |
| no_car_cov | TRUE | 39 |
| NA | NA | 5 |
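
A coverage-based call of this kind could be sketched as below. The toy data, the helper `make_lib`, and the column names are hypothetical, and the rule is assumed to apply per segment (at least 10 covered positions within a single segment) rather than to the total across segments.

```r
# Build a toy per-position coverage table for one library.
# `n_covered` gives the number of covered positions per segment (20 positions each).
make_lib <- function(lib_id, n_covered) {
  do.call(rbind, lapply(names(n_covered), function(seg) {
    k <- n_covered[[seg]]
    data.frame(lib_id = lib_id, segment = seg, pos = 1:20,
               cov = rep(c(3, 0), times = c(k, 20 - k)),
               stringsAsFactors = FALSE)
  }))
}
cov_df <- rbind(
  make_lib("lib1", c(CD19scFv = 0, T2A = 0,  EGFRt = 0)),
  make_lib("lib2", c(CD19scFv = 0, T2A = 12, EGFRt = 0)),
  make_lib("lib3", c(CD19scFv = 0, T2A = 5,  EGFRt = 9))
)

# Number of positions with at least one read, per library and segment.
n_pos <- tapply(cov_df$cov > 0, list(cov_df$lib_id, cov_df$segment), sum)

# Expressed if any target segment has >= 10 covered positions.
target <- c("CD19scFv", "T2A", "EGFRt")
car_expr_cov <- apply(n_pos[, target, drop = FALSE] >= 10, 1, any)
```

Under the per-segment reading, `lib3` is not called expressed even though it has 14 covered positions in total, because no single segment reaches 10.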
Plotting the remaining 53 libraries with CAR detected based on coverage.
## R version 3.2.1 (2015-06-18)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.4 (unknown)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] cowplot_0.6.1 scales_0.4.0 ggthemes_3.0.2
## [4] rtracklayer_1.30.4 GenomicRanges_1.22.4 GenomeInfoDb_1.6.3
## [7] IRanges_2.4.8 S4Vectors_0.8.11 BiocGenerics_0.16.1
## [10] viridis_0.3.4 ggplot2_2.1.0 dplyr_0.4.3
## [13] tidyr_0.4.1 stringr_1.0.0 knitr_1.12.3
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.4 highr_0.5.1
## [3] futile.logger_1.4.1 formatR_1.3
## [5] plyr_1.8.3 XVector_0.10.0
## [7] futile.options_1.0.0 bitops_1.0-6
## [9] tools_3.2.1 zlibbioc_1.16.0
## [11] digest_0.6.9 lattice_0.20-33
## [13] nlme_3.1-126 evaluate_0.8.3
## [15] gtable_0.2.0 mgcv_1.8-12
## [17] Matrix_1.2-4 DBI_0.3.1
## [19] yaml_2.1.13 gridExtra_2.2.1
## [21] Biostrings_2.38.4 grid_3.2.1
## [23] Biobase_2.30.0 R6_2.1.2
## [25] XML_3.98-1.4 BiocParallel_1.4.3
## [27] rmarkdown_0.9.5 reshape2_1.4.1
## [29] lambda.r_1.1.7 magrittr_1.5
## [31] GenomicAlignments_1.6.3 Rsamtools_1.22.0
## [33] htmltools_0.3.5 SummarizedExperiment_1.0.2
## [35] assertthat_0.1 colorspace_1.2-6
## [37] labeling_0.3 stringi_1.0-1
## [39] lazyeval_0.1.10 RCurl_1.95-4.8
## [41] munsell_0.4.3